I would say my brother and I have fairly different tastes in music. We both lived with our parents until approximately the end of 2017, in close proximity to each other and exposed to each other's music on a daily basis. Now we study in two different cities and see each other about once a month. In this portfolio I want to examine whether our proximity, or lack thereof, has influenced our respective tastes in music, and how exactly our tastes differ.
In this first figure, the means of five different Spotify features are plotted over the last four years. These numbers are based on our "Top Songs of 20xx" playlists, which Spotify compiles for each user at the end of the year, and on the audio features of the songs in these playlists, which can be retrieved with the Spotify API.
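As a rough illustration of how such yearly means can be computed, here is a minimal Python sketch. It assumes the audio features have already been fetched from the Spotify API into plain records. The field names (`person`, `year` and the feature keys) and the toy values are hypothetical, not our actual data.

```python
from collections import defaultdict
from statistics import mean

# The five features plotted in the first figure.
FEATURES = ["danceability", "energy", "acousticness", "instrumentalness", "valence"]

def yearly_means(songs):
    """Group songs by (person, year) and average each feature within each group."""
    groups = defaultdict(list)
    for song in songs:
        groups[(song["person"], song["year"])].append(song)
    return {
        key: {f: mean(s[f] for s in group) for f in FEATURES}
        for key, group in sorted(groups.items())
    }

# Hypothetical toy data: two songs for one person in one year.
songs = [
    {"person": "brother", "year": 2019, "danceability": 0.6, "energy": 0.9,
     "acousticness": 0.0, "instrumentalness": 0.8, "valence": 0.2},
    {"person": "brother", "year": 2019, "danceability": 0.4, "energy": 0.7,
     "acousticness": 0.2, "instrumentalness": 0.6, "valence": 0.4},
]
means = yearly_means(songs)
print(means[("brother", 2019)]["energy"])
```

Each (person, year) pair then becomes one point on a line in the plot.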
Looking at all the features, you can see that for danceability, energy and acousticness the lines start to diverge in 2017 and converge again in 2019. For the other two features, instrumentalness and valence, the means are closest together in 2017 and seem to diverge after that.
In this plot I explore the most interesting differences in features between us. Whereas the previous plot showed the means of these features over time, this plot shows the features of every individual song.
Instrumentalness: The first thing you will most likely notice is the increase in instrumentalness for my brother Joran. This increase was already visible in the previous plot, but this plot really puts it in perspective. My brother has always listened to a lot of electronic music, mostly drum & bass, but I think that in the last few years his taste in drum & bass has shifted from songs that mostly have vocals to very instrumental, bass-heavy songs. I, on the other hand, have started listening to more electronic music, but more on the vocal side, so this isn't reflected in the instrumentalness feature.
Danceability: The difference in this feature is mostly visible in 2016 and 2017, where my brother's songs are shifted slightly to the right and mine slightly to the left, but in the plot the difference doesn't look very significant. Danceability is quite high for both of us, and I think all of our music is quite danceable: for him mostly rock, punk and drum & bass, and for me mostly hip hop, indie (pop) and also drum & bass.
Valence: As seen in the previous plot, our valence values seem to have switched places. This is clearly visible in 2018 and 2019, where the valence of most of my songs is higher than that of most of my brother's songs. I don't have a good explanation for this, but it could be that Spotify doesn't rate hard drum & bass as very positive.
For these graphs I compared the keys of my top songs, my brother's top songs and, for a fair comparison, also the first 400 songs of the Dutch Top 2000 playlist. I chose this playlist because it seems like a good general playlist that represents the musical taste of the Netherlands.
Surprisingly, the graphs for my brother and me are quite similar, and nothing really stands out except perhaps that my playlist lacks the gaps between G, A and B that my brother's playlist has. To check that this is not just a standard distribution, you can look at the Top 2000 playlist, which has a very different distribution.
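The Spotify API reports a song's key as an integer pitch class (0 = C, 1 = C#/Db, up to 11 = B, and -1 when no key was detected), so the graphs above rely on some mapping to key names. Below is a minimal Python sketch of such a mapping plus a key distribution. Converting counts to fractions is an assumption on my part. It is one reasonable way to compare playlists of different sizes, such as our top songs and the 400 Top 2000 songs, and is not necessarily how the graphs were made.

```python
from collections import Counter

# Spotify's `key` field uses standard pitch-class notation starting at C.
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]

def key_name(key):
    """Map Spotify's integer key (0-11, or -1 if not detected) to a readable name."""
    return PITCH_CLASSES[key] if 0 <= key < 12 else "unknown"

def key_distribution(keys):
    """Fraction of songs per key, so playlists of different sizes are comparable."""
    counts = Counter(key_name(k) for k in keys)
    total = sum(counts.values())
    return {name: counts[name] / total
            for name in PITCH_CLASSES + ["unknown"] if counts[name]}

# Hypothetical example: five songs, two of them in G.
print(key_distribution([7, 9, 11, 7, -1]))
```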
When comparing the tempos of our top songs, there is a visible difference. The first thing you will notice when looking at the graphs is that my brother seems to listen to a lot of songs around 175 BPM. This is very different from the distribution of the Top 2000 songs, which can perhaps be seen as a sort of average of Dutch taste. However, this peak is easily explained by my brother's taste for drum & bass and other high-tempo electronic music, as I explained earlier.
The distribution of tempo in my top songs is actually quite similar to the Top 2000 distribution. While I also listen to some electronic music, it makes up a smaller part of my top songs. I would also say that, on average, the electronic music I listen to has a lower tempo than my brother's.
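A peak like the one around 175 BPM shows up as the most populated bin of a tempo histogram. Here is a minimal Python sketch under stated assumptions. The 10 BPM bin width and the toy tempos are hypothetical choices for illustration, not the settings used in the graphs.

```python
from collections import Counter

def tempo_histogram(tempos, bin_width=10):
    """Count songs per tempo bin; each key is the lower edge of a bin, in BPM."""
    return Counter(int(t // bin_width) * bin_width for t in tempos)

def peak_bin(tempos, bin_width=10):
    """Lower edge of the most populated tempo bin."""
    return tempo_histogram(tempos, bin_width).most_common(1)[0][0]

# Hypothetical tempos: three drum & bass tracks and two slower songs.
tempos = [174.0, 175.1, 176.9, 120.0, 95.5]
print(tempo_histogram(tempos))
print(peak_bin(tempos))
```

A narrower bin width makes a sharp genre-specific peak stand out more, at the cost of a noisier histogram for small playlists.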
For these plots we looked at the chroma and timbre of one song for each of us. These songs were selected from our top 10 songs of 2019 and were chosen because they show some interesting differences between their various parts.
The song that I chose from my playlist is Devil In A New Dress by Kanye West and Rick Ross. For the first part of the song there is a repeating beat with Kanye West rapping over it, but about 3 minutes into the song a bridge starts with a guitar solo (which is quite unusual in hip hop). This can be seen in both the timbre graph and the chromagram around the 200 mark on the x-axis. After this solo, Rick Ross starts his verse, which has a timbre and chroma similar to Kanye's part. Lastly, the song ends with another guitar part, which is clearly visible in the timbre graph but less so in the chromagram. This is because during the first solo the guitar plays alone, whereas at the end the beat keeps playing underneath the guitar, so the chromagram doesn't pick the guitar up as clearly.
For my brother I chose the song Space Oddity by David Bowie (which, to be honest, I wasn't very familiar with). The song starts with an instrumental, followed by its first verse and then another instrumental. Then there is a second verse, a chorus, a third verse, the chorus again and finally another instrumental part. Looking closely at the timbre graph, these different parts of the song can be picked out. There are four instrumental parts, which can be seen as the four darker 'rectangles' along the diagonal. Between those rectangles are the verses and choruses, of which the choruses have the brightest colours.
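The rectangles along the diagonal come from comparing every moment of a song with every other moment. Below is a minimal Python/NumPy sketch of a cosine self-distance matrix. It assumes one feature vector per row, for example Spotify's 12 timbre coefficients per segment. The toy input is hypothetical, and the colour mapping of the actual plots is not reproduced here. Segments that resemble each other get small distances, so a section of similar segments forms a square block on the diagonal.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-distance matrix: entry (i, j) is 1 - cos(x_i, x_j)."""
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1, norms)  # avoid dividing by zero for silent rows
    return 1.0 - unit @ unit.T

# Hypothetical toy "song": two segments of one sound, then two of another.
frames = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(self_similarity(frames))
```

In the toy example, the two sections appear as two zero-distance 2x2 blocks on the diagonal, which is the same pattern that separates the instrumental parts from the verses in the timbre graph.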
In these graphs I look at the keygrams of the same two songs whose chromagrams I looked at: Devil In A New Dress and Space Oddity.
Devil In A New Dress definitely seems to be in F minor. This is most clearly seen during the verses, and is less clear during the bridge in the middle of the song.
Space Oddity has a less clear key and may even seem to switch for a short while. The song seems to be in Db major, but during the verses the keygram suggests D minor. I don't think the key actually changes during the song, but I can't say definitively what key the song is in by looking at this graph alone.
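A keygram compares the chroma of each part of a song with a template for every major and minor key. The following is a minimal Python sketch of that idea, assuming Krumhansl-Kessler key profiles and cosine similarity. The graphs above may have used different templates or a different distance measure. A keygram would apply this comparison to each time window. When the scores of several keys are close, the keygram looks ambiguous, as it does for Space Oddity.

```python
import math

# Krumhansl-Kessler key profiles, listed starting from the tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def rotate(profile, tonic):
    """Shift a tonic-first profile so it lines up with a chroma vector starting at C."""
    return [profile[(i - tonic) % 12] for i in range(12)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def key_scores(chroma):
    """Similarity of a 12-bin chroma vector (C first) to all 24 key templates."""
    scores = {}
    for tonic in range(12):
        scores[f"{NAMES[tonic]} major"] = cosine(chroma, rotate(MAJOR, tonic))
        scores[f"{NAMES[tonic]} minor"] = cosine(chroma, rotate(MINOR, tonic))
    return scores

def best_key(chroma):
    """Key whose template is most similar to the chroma vector."""
    scores = key_scores(chroma)
    return max(scores, key=scores.get)
```

For example, `best_key(rotate(MINOR, 5))` picks out the F minor template, because a chroma vector shaped exactly like that template matches it perfectly.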
(Just kidding: my mom can tell us apart just fine, although our voices do sound kind of alike.)
An interesting question is whether my brother's and my music tastes are different enough for a classifier to tell them apart. To find out, we compare three classification algorithms trained on a dataset of 300 of my top songs and 300 of my brother's top songs.
K-nearest neighbors: The first graph shows the confusion matrix for this algorithm, in which a clear diagonal can be seen. In a confusion matrix, the diagonal contains the correctly classified songs, so a solid diagonal means the classifier classified everything correctly, and a clearly visible diagonal means it has done a fairly good job. Looking at the numbers, the classifier had an accuracy of 62.8%. The baseline for this prediction is 50%: because both classes contain 300 songs, that is the accuracy you would expect from picking a person at random. So the classifier does better than picking at random, but it is definitely not perfect. Let's try to improve on this.
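To make the confusion matrix and accuracy concrete, here is a minimal pure-Python sketch of k-nearest neighbors. It uses Euclidean distance, a majority vote and k = 3, all of which are assumptions rather than the exact settings behind the 62.8% result. In practice, features should be put on comparable scales first, because k-NN is sensitive to feature ranges. The labels "me" and "brother" and the toy points are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a feature vector by majority vote among its k nearest training songs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def confusion_matrix(truth, predicted, labels):
    """Rows are true labels, columns predicted labels; the diagonal holds correct predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(truth, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of all predictions that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))

def majority_baseline(truth):
    """Accuracy of always guessing the largest class (0.5 for two equal classes)."""
    return Counter(truth).most_common(1)[0][1] / len(truth)
```

With 300 songs per person, `majority_baseline` returns 0.5, which matches the 50% baseline above.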
Decision trees: Using the decision tree method, the accuracy increased to 67.0%. The confusion matrix is shown in the second graph. It is interesting to note that the number of correct predictions for my songs increased, whereas for my brother's songs it slightly decreased. However, this is not a big difference and could change each time the classifier is trained.
Random forests: Lastly, using the random forest classification method, the accuracy increased to 73.8%. This method definitely seems to be the best for this specific dataset. It also produces the best-looking confusion matrix, shown in the third graph, with more correct predictions for both playlists.
On the next page we will look at which features were the most important for the accuracy of this random forest classifier.
As discussed on the previous page, we will now look at the features that were most important for the random forest classification algorithm. Our top 5 consists of c06, energy, acousticness, c02 and c01. c06, c02 and c01 are three timbre features, which describe the characteristic quality of a sound independent of its pitch and loudness. You could think of this as the kind of instrumentation. Unfortunately it is hard to say what these three features represent exactly, but they do show that my brother's top songs and mine differ a lot in instrumentation.
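The text does not say how the importance ranking was computed. Random forest implementations often report an impurity-based importance. As a swapped-in, model-agnostic illustration, here is a minimal Python sketch of permutation importance. It shuffles one feature column and measures how much the accuracy drops, so a feature the model relies on heavily produces a large drop. The `predict` function, the toy rows and the number of repeats are hypothetical.

```python
import random

def accuracy_of(predict, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, n_features, repeats=10, seed=0):
    """Mean accuracy drop when one feature column is shuffled across rows."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    base = accuracy_of(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = []
            for r, v in zip(rows, column):
                new_row = list(r)
                new_row[j] = v
                shuffled.append(new_row)
            drops.append(base - accuracy_of(predict, shuffled, labels))
        importances.append(sum(drops) / repeats)
    return importances

# Hypothetical model that only looks at the first feature.
predict = lambda r: 1 if r[0] > 0.5 else 0
rows = [[0.1, 0.3], [0.9, 0.2], [0.2, 0.8], [0.8, 0.7]]
print(permutation_importance(predict, rows, [0, 1, 0, 1], n_features=2))
```

Because the toy model ignores the second feature, shuffling that feature never changes the accuracy, so its importance is exactly zero.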
Energy and acousticness are the other two features that are very important for the classification. Looking back at our first plot, acousticness does seem to differ a lot between the two of us, with the greatest difference in means in 2017. However, the mean energy does not seem to differ much. There is definitely a difference, with my music having higher energy than my brother's, but judging only by the first plot, other features such as danceability seem to differ much more. Looking at the boxplots of these two features, however, you suddenly see why energy was chosen over danceability, and that means can be misleading.